# load necessary packages
library(tidyverse)
── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(mosaic)
Loading required package: lattice
Loading required package: ggformula
Loading required package: ggstance

Attaching package: ‘ggstance’

The following objects are masked from ‘package:ggplot2’:

    geom_errorbarh, GeomErrorbarh


New to ggformula?  Try the tutorials: 
    learnr::run_tutorial("introduction", package = "ggformula")
    learnr::run_tutorial("refining", package = "ggformula")
Loading required package: mosaicData
Loading required package: Matrix

Attaching package: ‘Matrix’

The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack


The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Note: If you use the Matrix package, be sure to load it BEFORE loading mosaic.

Attaching package: ‘mosaic’

The following object is masked from ‘package:Matrix’:

    mean

The following objects are masked from ‘package:dplyr’:

    count, do, tally

The following object is masked from ‘package:purrr’:

    cross

The following object is masked from ‘package:ggplot2’:

    stat

The following objects are masked from ‘package:stats’:

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test, quantile, sd, t.test, var

The following objects are masked from ‘package:base’:

    max, mean, min, prod, range, sample, sum
library(DataComputing)
library(ggplot2)

Guiding Question

What factors are common among MLB players inducted into the Hall of Fame?

Purpose

This document is required to indicate where various requirements can be found within your Final Project Report Rmd. You must indicate line numbers as they appear in your final Rmd document accompanying each of the following required tasks. Points will be deducted if line numbers are missing or differ significantly from the submitted Final Rmd document.

Final Project Requirements

Data Access

Description: (1) Analysis includes at least two different data sources. (2) Primary data source may NOT be loaded from an R package–though supporting data may. (3) Access to all data sources is contained within the analysis. (4) Imported data is inspected at beginning of analysis using one or more R functions: e.g., str, glimpse, head, tail, names, nrow, etc

  1. .Rmd Line numbers where at least two different data sources are imported:
HallOfFame <- read_csv("core/HallOfFame.csv")
Parsed with column specification:
cols(
  playerID = col_character(),
  yearID = col_double(),
  votedBy = col_character(),
  ballots = col_double(),
  needed = col_double(),
  votes = col_double(),
  inducted = col_character(),
  category = col_character(),
  needed_note = col_character()
)
AllstarFull <- read_csv("core/AllstarFull.csv")
Parsed with column specification:
cols(
  playerID = col_character(),
  yearID = col_double(),
  gameNum = col_double(),
  gameID = col_character(),
  teamID = col_character(),
  lgID = col_character(),
  GP = col_double(),
  startingPos = col_double()
)
Salaries <- read_csv("core/Salaries.csv")
Parsed with column specification:
cols(
  yearID = col_double(),
  teamID = col_character(),
  lgID = col_character(),
  playerID = col_character(),
  salary = col_double()
)
Batting <- read_csv("core/Batting.csv")
Parsed with column specification:
cols(
  .default = col_double(),
  playerID = col_character(),
  teamID = col_character(),
  lgID = col_character(),
  IBB = col_logical(),
  HBP = col_logical(),
  SH = col_logical(),
  SF = col_logical()
)
See spec(...) for full column specifications.
87292 parsing failures.
 row col           expected actual               file
1999 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2001 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2020 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2022 HBP 1/0/T/F/TRUE/FALSE      2 'core/Batting.csv'
2027 HBP 1/0/T/F/TRUE/FALSE      5 'core/Batting.csv'
.... ... .................. ...... ..................
See problems(...) for more details.
  2. .Rmd Line numbers for inspecting data intake:
head(HallOfFame)
glimpse(HallOfFame)
head(Salaries)
glimpse(Salaries)
head(Batting)

Data Wrangling (5 out of 8 required)

Description: Students need not use every function and method introduced in STAT 184, but clear demonstration of proficiency should include proper use of 5 out of the following 8 topics from class: (+) various data verbs for general data wrangling like filter, mutate, summarise, arrange, group_by, etc. (+) joins for multiple data tables. (+) spread & gather to stack/unstack variables (+) regular expressions (+) reduction and/or transformation functions like mean, sum, max, min, n(), rank, pmin, etc. (+) user-defined functions (+) loops and control flow (+) machine learning

  1. .Rmd Line number(s) for general data wrangling:
InductedP<-
  HallOfFame%>%
  filter(inducted == "Y")%>%
  select(playerID, yearID)
InductedP
Money<-
  Salaries%>%
  select(teamID, playerID, salary, yearID)
Money
  2. .Rmd Line number(s) for a join operation:
HallMoney<-
  InductedP%>%
  inner_join(Money, by = c("playerID" = "playerID"))
HallMoney
#
AvgHS<- 
  HallMoney%>%
  group_by(playerID)%>%
  mutate(Salary = mean(salary))
AvgHS
#Players in the whole league
WholeLeague <-
  Batting %>%
  filter(G > 20)%>%
  select(playerID, yearID)
WholeLeague
OnlyL<-
  WholeLeague%>%
  select(playerID, yearID)%>%
  inner_join(HallOfFame, by =c("playerID" = "playerID"))%>%
  filter(inducted == "N")
OnlyL
#All salaries for players in the league  
WLeagueS<-
  OnlyL%>%
  inner_join(Money, by =c("playerID" = "playerID"))
WLeagueS
#Average League salary for each player
AvgS<-
  WLeagueS%>%
  group_by(playerID)%>%
  summarise(Salary = mean(salary))
AvgS

This join operation combines the AllstarFull table with the HallOfFame table so that we can relate Hall of Fame induction to All-Star Game appearances. We can use the joined data set to look for commonalities among Hall of Fame players.


# Join the AllstarFull table with the HallOfFame table
AllStarHallOfFameJoin <-
  AllstarFull %>%
  inner_join(HallOfFame, by = c("playerID" = "playerID"))

# Count how many All-Star Game appearances each inducted player made
AllStarHallOfFameTable <-
  AllStarHallOfFameJoin %>%
  filter(inducted == "Y") %>%
  group_by(playerID, inducted) %>%
  summarise(AppearanceCount = n()) %>%
  arrange(desc(AppearanceCount))
  
AllStarHallOfFameTable
AvgHS %>%
  ggplot(aes(x = yearID.y, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Year", y = "Average salary (USD)")

AvgS %>%
  ggplot(aes(x = playerID, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Player", y = "Average salary (USD)")

The table displayed below uses a regular expression to find the seasons in which inducted Hall of Fame players recorded 500 or more at-bats (AB) while playing in more than 20 games. This shows how heavily the inducted players were used as batters and the years in which they reached that mark.


HallOfFamePlayers <-
  HallOfFame %>%
  filter(inducted == "Y") %>%
  select(playerID, yearID)


HallOfFameBatting <-
  Batting %>%
  inner_join(HallOfFamePlayers, by = c("playerID" = "playerID"))


HallOfFameBatAvg <-
  HallOfFameBatting %>%
  filter(G > 20) %>%
  select(playerID, yearID.x, AB)


HallOfFameOver500 <-
  HallOfFameBatAvg %>%
  extractMatches("^([5-9]{1}[0-9]{2}).*$", AB) %>%
  filter( ! is.na(match1))
HallOfFameOver500

In the home run bar chart comparing Hall of Famers with all players, the average number of home runs per year for Hall of Famers appears lower than for the rest of the league. This would suggest that home runs are not a crucial statistic for getting into the Hall of Fame.


AllStarHallOfFameGraphData <-
  AllStarHallOfFameTable %>%
  group_by(AppearanceCount) %>%
  summarise(total = n())

AllStarHallOfFameGraph <-
  AllStarHallOfFameGraphData %>%
  ggplot(aes(x = AppearanceCount, y = total)) +
  geom_bar(stat = "identity", color = "black", fill = "red") +
  geom_hline(aes(yintercept = mean(total)), color = "blue")

AllStarHallOfFameGraph
#Batting stats

BattingAndHallofFame <- 
  Batting %>%
  full_join(HallOfFame, by = c("playerID" = "playerID")) %>%
  filter(inducted == "Y")

#Only Take stats with players having more than 20 games
BattingAndHallofFame <-
  BattingAndHallofFame %>%
    filter(G > 20) 

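# Average home runs (HR, not hits H) per year for Hall of Famers
HallOfFamerHomies <-
  BattingAndHallofFame %>%
    select(yearID.x, HR) %>%
    group_by(yearID.x) %>%
    summarise(AvgHsHOF = mean(HR)) %>%
    rename(yearID = yearID.x)

# Overall average home runs for all players with more than 20 games
EverybodyBatting <-
  Batting %>%
    filter(G > 20) %>%
    summarise(AvgHomeRunPerPlayerEverybody = mean(HR))

# Average home runs per year for all players with more than 20 games
GraphHomeRuns <-
  Batting %>%
    select(yearID, G, HR) %>%
    group_by(yearID) %>%
    filter(G > 20) %>%
    summarise(AvgHsEvery = mean(HR))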

#Which league has greater HOF chance
LeagueFromAmerican <-
  BattingAndHallofFame %>%
    select(yearID.x, lgID) %>%
    group_by(yearID.x) %>%
    filter(lgID == "AL") %>%
    summarise(American = n()) %>%
    rename(yearID = yearID.x)

LeagueFromNational <-
  BattingAndHallofFame %>%
    select(yearID.x, lgID) %>%
    group_by(yearID.x) %>%
    filter(lgID == "NL") %>%
    summarise(National = n()) %>%
    rename(yearID = yearID.x)

LeagueGraph <-
  LeagueFromAmerican %>%
    full_join(LeagueFromNational, by = c("yearID" = "yearID"))

 
LeagueGraph1 <-
   LeagueGraph %>%
      gather(key = kind, value = Total, American, National)

LeagueGraph1

Graph1 <-
  GraphHomeRuns %>%
    full_join(HallOfFamerHomies, by = c("yearID" = "yearID"))

Graph1.1 <- 
  Graph1 %>%
    gather(key = kind, value = Avg, AvgHsHOF, AvgHsEvery)


ggplot(data = Graph1.1, aes(x = yearID, y = Avg, fill = kind)) +
  geom_bar(stat = "identity", position = "stack", width = 0.9) +
  ggtitle("HOF Vs Everybody Homeruns")
  

  
  3. .Rmd Line number(s) for a spread or gather operation (or equivalent):

  4. .Rmd Line number(s) for use of regular expressions:

  5. .Rmd Line number(s) for use of reduction and/or transformation functions:

  6. .Rmd Line number(s) for use of user-defined functions:

  7. .Rmd Line number(s) for use of loops and/or control flow:

  8. .Rmd Line number(s) for use of machine learning (not “wrangling” but scored here):

Data Visualization (3 of 5 required)

Description: Students need not use every function and method introduced in STAT 184, but clear demonstration of proficiency should include a range of useful data visualizations that are (1) relevant to stated research question for the analysis, (2) include at least one effective display of many–at least 3–variables, and (3) include 3 of the following 5 visualization techniques learned in STAT 184: (+) use of multiple geoms such as points, density, lines, segments, boxplots, bar charts, histograms, etc (+) use of multiple aesthetics–not necessarily all in the same graph–such as color, size, shape, x/y position, facets, etc (+) layered graphics such as points and accompanying smoother, points and accompanying boxplots, overlaid density distributions, etc (+) leaflet maps (+) decision tree and/or dendrogram displaying machine learning model results

  1. .Rmd Line number(s) for use of multiple different geoms:



AllStarHallOfFameGraphData <-
  AllStarHallOfFameTable %>%
  group_by(AppearanceCount) %>%
  summarise(total = n())

AllStarHallOfFameGraph <-
  AllStarHallOfFameGraphData %>%
  ggplot(aes(x = AppearanceCount, y = total)) +
  geom_bar(stat = "identity", color = "black", fill = "red") +
  geom_hline(aes(yintercept = mean(total)), color = "blue")

AllStarHallOfFameGraph
AvgHS %>%
  ggplot(aes(x = playerID, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Player", y = "Average salary (USD)")
AvgS %>%
  ggplot(aes(x = playerID, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Player", y = "Average salary (USD)")

  2. .Rmd Line number(s) for use of multiple aesthetics:

  3. .Rmd Line number(s) for use of layered graphics:

  4. .Rmd Line number(s) for use of leaflet maps:

  5. .Rmd Line number(s) for use of decision tree or dendrogram results:

Other requirements (Nothing for you to report in this Guidance Document)

  1. All data visualizations must be relevant to the stated research question, and the report must include at least one effective display of many–at least 3–variables

  2. Code quality: Code formatting is consistent with Style Guide Appendix of DataComputing eBook. Specifically, all code chunks demonstrate proficiency with (1) meaningful object names (2) proper use of white space especially with respect to infix operators, chain operators, commas, brackets/parens, etc (3) use of <- assignment operator throughout (4) use of meaningful comments.

  3. Narrative quality: The narrative text (1) clearly states one research question that motivates the overall analysis, (2) explains reasoning for each significant step in the analysis and its relationship to the research question, (3) explains significant findings and conclusions as they relate to the research question, and (4) is completely free of errors in spelling and grammar

  4. Overall Quality: (1) Submitted project shows significant effort to produce a high-quality and thoughtful analysis that showcases STAT 184 skills. (2) The project must be self-contained, such that the analysis can be entirely rerun without errors. (3) Analysis is coherent, well-organized, and free of extraneous content such as data dumps, unrelated graphs, and other content that is not overtly connected to the research question.

  5. EXTRA CREDIT (1) Project is submitted as a self-contained GitHub Repo (2) project submission is a functioning github.io webpage generated for the project Repo. Note: a link to the GitHub Repo itself will be awarded partial credit, but does not itself qualify as a “webpage” of the analysis.

---
title: "R Notebook"
author: "Group 5, Matthew Hines, Emmanuel Garzo, Dustin Beaver"
date: "4/16/2020"
output: html_notebook
---


```{r}
# load necessary packages
library(tidyverse)
library(mosaic)
library(DataComputing)
library(ggplot2)
```

# Guiding Question

What factors are common among MLB players inducted into the Hall of Fame?


# Purpose

*This document is required to indicate where various requirements can be found within your Final Project Report Rmd.  You must* **indicate line numbers as they appear in your final Rmd document** *accompanying each of the following required tasks. Points will be deducted if line numbers are missing or differ significantly from the submitted Final Rmd document.*  


# Final Project Requirements


### Data Access

*Description: (1) Analysis includes at least two different data sources. (2) Primary data source may NOT be loaded from an R package--though supporting data may. (3) Access to all data sources is contained within the analysis. (4) Imported data is inspected at beginning of analysis using one or more R functions: e.g., str, glimpse, head, tail, names, nrow, etc*

(A) .Rmd Line numbers where at least two different data sources are imported:  

```{r Loading the Data}

HallOfFame <- read_csv("core/HallOfFame.csv")
AllstarFull <- read_csv("core/AllstarFull.csv")
Salaries <- read_csv("core/Salaries.csv")
Batting <- read_csv("core/Batting.csv")

```
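
The parse log for `Batting.csv` shows columns such as `HBP` being read as logical, which can happen when a column is blank in the early rows that `read_csv()` inspects while guessing types. The following is a minimal sketch of one way to avoid that, assuming a hypothetical miniature file with made-up player IDs; it is an illustration, not the project's actual import code.

```r
library(readr)

# Hypothetical miniature of a batting file: HBP is blank in the first rows,
# which is the situation that can lead type guessing astray.
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "playerID,yearID,G,HBP",
    "playerA01,1871,25,",
    "playerB01,1871,12,",
    "playerC01,1927,151,2",
    "playerD01,2001,153,5"
  ),
  toy_path
)

# Declare playerID as text and every other column as numeric,
# so blank cells become NA instead of forcing a logical column
toy_batting <- read_csv(
  toy_path,
  col_types = cols(
    playerID = col_character(),
    .default = col_double()
  )
)

toy_batting
```

For the real file, the same `col_types` argument would presumably also need `teamID` and `lgID` declared as character columns; raising `guess_max` is another option.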

(B) .Rmd Line numbers for inspecting data intake:  
```{r}
head(HallOfFame)
```

```{r}
glimpse(HallOfFame)
```

```{r}
head(Salaries)
```

```{r}
glimpse(Salaries)
```
```{r}
head(Batting)
```


### Data Wrangling (5 out of 8 required)

*Description: Students need not use every function and method introduced in STAT 184, but clear demonstration of proficiency should include proper use of 5 out of the following 8 topics from class: (+) various data verbs for general data wrangling like filter, mutate, summarise, arrange, group_by, etc. (+) joins for multiple data tables. (+) spread & gather to stack/unstack variables (+) regular expressions (+) reduction and/or transformation functions like mean, sum, max, min, n(), rank, pmin, etc. (+) user-defined functions (+) loops and control flow (+) machine learning*


(A) .Rmd Line number(s) for general data wrangling: 
```{r}

# Keep only inducted players and the year they were inducted
InductedP <-
  HallOfFame %>%
  filter(inducted == "Y") %>%
  select(playerID, yearID)

InductedP
```

```{r}
# Keep the salary columns needed for the joins
Money <-
  Salaries %>%
  select(teamID, playerID, salary, yearID)

Money
```


(B) .Rmd Line number(s) for a join operation: 
```{r}
# Salary records for inducted players
# (yearID.x = induction year, yearID.y = salary year)
HallMoney <-
  InductedP %>%
  inner_join(Money, by = "playerID")
HallMoney

# Attach each inducted player's average salary to every salary row
AvgHS <-
  HallMoney %>%
  group_by(playerID) %>%
  mutate(Salary = mean(salary))
AvgHS

# Player-seasons with more than 20 games played
WholeLeague <-
  Batting %>%
  filter(G > 20) %>%
  select(playerID, yearID)
WholeLeague

# Players who appeared on a Hall of Fame ballot without being inducted
OnlyL <-
  WholeLeague %>%
  inner_join(HallOfFame, by = "playerID") %>%
  filter(inducted == "N")
OnlyL

# All salary records for those non-inducted players
WLeagueS <-
  OnlyL %>%
  inner_join(Money, by = "playerID")
WLeagueS

# Average salary for each non-inducted player
AvgS <-
  WLeagueS %>%
  group_by(playerID) %>%
  summarise(Salary = mean(salary))
AvgS


```
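
Because both tables in these joins carry a `yearID` column, `inner_join()` keeps both and distinguishes them with the default `.x` and `.y` suffixes, which is where names such as `yearID.y` come from. A small sketch with hypothetical toy tables (the IDs, years, and salaries are made up) shows how the `suffix` argument could give those columns more descriptive names:

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables; all values are invented for illustration
inducted_toy <- tibble(
  playerID = c("playerA01", "playerB01"),
  yearID   = c(1990, 2005)                  # induction year
)
salary_toy <- tibble(
  playerID = c("playerA01", "playerA01", "playerB01", "playerC01"),
  yearID   = c(1985, 1986, 2000, 2001),     # season of the salary
  salary   = c(100000, 200000, 500000, 750000)
)

# Only players present in both tables survive an inner join;
# the shared non-key column yearID is split into two suffixed columns
joined_toy <-
  inducted_toy %>%
  inner_join(salary_toy, by = "playerID", suffix = c("_inducted", "_salary"))

joined_toy
```

Here `playerC01` is dropped because it has no match in `inducted_toy`, and the two year columns can be told apart without remembering which table was on the left.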




This join operation combines the AllstarFull table with the HallOfFame table so that we can relate Hall of Fame induction to All-Star Game appearances. We can use the joined data set to look for commonalities among Hall of Fame players.
```{r}

# Join the AllstarFull table with the HallOfFame table
AllStarHallOfFameJoin <-
  AllstarFull %>%
  inner_join(HallOfFame, by = c("playerID" = "playerID"))

# Count how many All-Star Game appearances each inducted player made
AllStarHallOfFameTable <-
  AllStarHallOfFameJoin %>%
  filter(inducted == "Y") %>%
  group_by(playerID, inducted) %>%
  summarise(AppearanceCount = n()) %>%
  arrange(desc(AppearanceCount))
  
AllStarHallOfFameTable

```
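
The `group_by()` plus `summarise(n())` pattern used to count appearances can also be written with `count()`. Since the package startup messages report that `mosaic` masks `count` from `dplyr`, this sketch calls `dplyr::count()` explicitly. The data are a hypothetical toy table, not the real `AllstarFull` records.

```r
library(dplyr)
library(tibble)

# Hypothetical All-Star rows: one row per appearance
allstar_toy <- tibble(
  playerID = c("playerA01", "playerA01", "playerA01", "playerB01"),
  yearID   = c(1980, 1981, 1982, 1999)
)

# Count rows per player and sort from most to fewest appearances
appearance_toy <-
  allstar_toy %>%
  dplyr::count(playerID, name = "AppearanceCount") %>%
  arrange(desc(AppearanceCount))

appearance_toy
```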


```{r}

AvgHS %>%
  ggplot(aes(x = yearID.y, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Year", y = "Average salary (USD)")

```


```{r}

AvgS %>%
  ggplot(aes(x = playerID, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Player", y = "Average salary (USD)")

```
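
If the salary axis is meant to read in millions, the values need to be rescaled before plotting. The sketch below assumes salaries are recorded in dollars and uses a hypothetical toy table with invented values; `SalaryMillions` is a name introduced only for this illustration.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical average salaries in dollars (made-up values)
avg_salary_toy <- tibble(
  playerID = c("playerA01", "playerB01", "playerC01"),
  Salary   = c(250000, 1500000, 4200000)
)

# Rescale to millions so the axis label matches the plotted values
avg_salary_millions <-
  avg_salary_toy %>%
  mutate(SalaryMillions = Salary / 1e6)

salary_plot <-
  avg_salary_millions %>%
  ggplot(aes(x = playerID, y = SalaryMillions)) +
  geom_point() +
  labs(x = "Player", y = "Average salary (millions of USD)")

salary_plot
```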

The table displayed below uses a regular expression to find the seasons in which inducted Hall of Fame players recorded 500 or more at-bats (`AB`) while playing in more than 20 games. This shows how heavily the inducted players were used as batters and the years in which they reached that mark.
```{r}

HallOfFamePlayers <-
  HallOfFame %>%
  filter(inducted == "Y") %>%
  select(playerID, yearID)


HallOfFameBatting <-
  Batting %>%
  inner_join(HallOfFamePlayers, by = c("playerID" = "playerID"))


HallOfFameBatAvg <-
  HallOfFameBatting %>%
  filter(G > 20) %>%
  select(playerID, yearID.x, AB)


HallOfFameOver500 <-
  HallOfFameBatAvg %>%
  extractMatches("^([5-9]{1}[0-9]{2}).*$", AB) %>%
  filter( ! is.na(match1))
HallOfFameOver500

```
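
To see what the pattern above actually accepts, it can be tried on a few values with base R's `grepl()`. The at-bat values here are hypothetical and chosen to probe the edges. The pattern tests the leading digit, so it agrees with a numeric `>= 500` check for three-digit values but not for a four-digit value such as 1000.

```r
# The same pattern used in the chunk above
at_bat_pattern <- "^([5-9]{1}[0-9]{2}).*$"

# Hypothetical at-bat values, as text, chosen to probe the edges
at_bats_toy <- c("87", "499", "500", "612", "1000")

# Regex result versus a plain numeric comparison
pattern_hits <- grepl(at_bat_pattern, at_bats_toy)
numeric_hits <- as.numeric(at_bats_toy) >= 500

data.frame(AB = at_bats_toy, pattern_hits, numeric_hits)
```

When the column is already numeric, a plain `filter(AB >= 500)` expresses the same intent more directly; the regex version is mainly useful as a demonstration of pattern matching.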


In the home run bar chart comparing Hall of Famers with all players, the average number of home runs per year for Hall of Famers appears lower than for the rest of the league. This would suggest that home runs are not a crucial statistic for getting into the Hall of Fame.
```{r}

AllStarHallOfFameGraphData <-
  AllStarHallOfFameTable %>%
  group_by(AppearanceCount) %>%
  summarise(total = n())

AllStarHallOfFameGraph <-
  AllStarHallOfFameGraphData %>%
  ggplot(aes(x = AppearanceCount, y = total)) +
  geom_bar(stat = "identity", color = "black", fill = "red") +
  geom_hline(aes(yintercept = mean(total)), color = "blue")

AllStarHallOfFameGraph

```



```{r}
#Batting stats

BattingAndHallofFame <- 
  Batting %>%
  full_join(HallOfFame, by = c("playerID" = "playerID")) %>%
  filter(inducted == "Y")

#Only Take stats with players having more than 20 games
BattingAndHallofFame <-
  BattingAndHallofFame %>%
    filter(G > 20) 

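# Average home runs (HR, not hits H) per year for Hall of Famers
HallOfFamerHomies <-
  BattingAndHallofFame %>%
    select(yearID.x, HR) %>%
    group_by(yearID.x) %>%
    summarise(AvgHsHOF = mean(HR)) %>%
    rename(yearID = yearID.x)

# Overall average home runs for all players with more than 20 games
EverybodyBatting <-
  Batting %>%
    filter(G > 20) %>%
    summarise(AvgHomeRunPerPlayerEverybody = mean(HR))

# Average home runs per year for all players with more than 20 games
GraphHomeRuns <-
  Batting %>%
    select(yearID, G, HR) %>%
    group_by(yearID) %>%
    filter(G > 20) %>%
    summarise(AvgHsEvery = mean(HR))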

#Which league has greater HOF chance
LeagueFromAmerican <-
  BattingAndHallofFame %>%
    select(yearID.x, lgID) %>%
    group_by(yearID.x) %>%
    filter(lgID == "AL") %>%
    summarise(American = n()) %>%
    rename(yearID = yearID.x)

LeagueFromNational <-
  BattingAndHallofFame %>%
    select(yearID.x, lgID) %>%
    group_by(yearID.x) %>%
    filter(lgID == "NL") %>%
    summarise(National = n()) %>%
    rename(yearID = yearID.x)

LeagueGraph <-
  LeagueFromAmerican %>%
    full_join(LeagueFromNational, by = c("yearID" = "yearID"))

 
LeagueGraph1 <-
   LeagueGraph %>%
      gather(key = kind, value = Total, American, National)

LeagueGraph1

Graph1 <-
  GraphHomeRuns %>%
    full_join(HallOfFamerHomies, by = c("yearID" = "yearID"))

Graph1.1 <- 
  Graph1 %>%
    gather(key = kind, value = Avg, AvgHsHOF, AvgHsEvery)


ggplot(data = Graph1.1, aes(x = yearID, y = Avg, fill = kind)) +
  geom_bar(stat = "identity", position = "stack", width = 0.9) +
  ggtitle("HOF Vs Everybody Homeruns")
  

  

```
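
The `gather()` calls above stack two wide columns into key/value pairs. In tidyr 1.0.0 and later, `pivot_longer()` is the recommended replacement for `gather()`. A minimal sketch on a hypothetical toy table (the yearly averages are invented) shows the equivalent reshaping:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per group, one row per year
averages_wide <- tibble(
  yearID     = c(2000, 2001),
  AvgHsHOF   = c(12.5, 14.0),
  AvgHsEvery = c(10.0, 11.5)
)

# Stack the two average columns into a "kind" label and an "Avg" value
averages_long <-
  averages_wide %>%
  pivot_longer(
    cols      = c(AvgHsHOF, AvgHsEvery),
    names_to  = "kind",
    values_to = "Avg"
  )

averages_long
```

The long form has one row per year and group, which is the shape `ggplot()` needs in order to map `kind` to the `fill` aesthetic.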



(C) .Rmd Line number(s) for a spread or gather operation (or equivalent):


(D) .Rmd Line number(s) for use of regular expressions: 


(E) .Rmd Line number(s) for use of reduction and/or transformation functions: 


(F) .Rmd Line number(s) for use of user-defined functions: 


(G) .Rmd Line number(s) for use of loops and/or control flow: 


(H) .Rmd Line number(s) for use of machine learning (not "wrangling" but scored here): 



### Data Visualization (3 of 5 required)

*Description: Students need not use every function and method introduced in STAT 184, but clear demonstration of proficiency should include a range of useful data visualizations that are (1) relevant to stated research question for the analysis, (2) include at least one effective display of many--at least 3--variables, and (3) include 3 of the following 5 visualization techniques learned in STAT 184: (+) use of multiple geoms such as points, density, lines, segments, boxplots, bar charts, histograms, etc (+) use of multiple aesthetics--not necessarily all in the same graph--such as color, size, shape, x/y position, facets, etc (+) layered graphics such as points and accompanying smoother, points and accompanying boxplots, overlaid density distributions, etc (+) leaflet maps (+) decision tree and/or dendrogram displaying machine learning model results*




(A) .Rmd Line number(s) for use of multiple different geoms:

```{r}

AllStarHallOfFameGraphData <-
  AllStarHallOfFameTable %>%
  group_by(AppearanceCount) %>%
  summarise(total = n())

AllStarHallOfFameGraph <-
  AllStarHallOfFameGraphData %>%
  ggplot(aes(x = AppearanceCount, y = total)) +
  geom_bar(stat = "identity", color = "black", fill = "red") +
  geom_hline(aes(yintercept = mean(total)), color = "blue")

AllStarHallOfFameGraph

```

```{r}
AvgHS %>%
  ggplot(aes(x = playerID, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Player", y = "Average salary (USD)")
```
```{r}
AvgS %>%
  ggplot(aes(x = playerID, y = Salary, color = Salary)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Player", y = "Average salary (USD)")
```


(B) .Rmd Line number(s) for use of multiple aesthetics:  

(C) .Rmd Line number(s) for use of layered graphics:  

(D) .Rmd Line number(s) for use of leaflet maps:  

(E) .Rmd Line number(s) for use of decision tree or dendrogram results:    




### Other requirements (Nothing for you to report in this Guidance Document)

(A) *All data visualizations* must be relevant to the stated research question, and the report must include at least one effective display of many--at least 3--variables 

(B) *Code quality:* Code formatting is consistent with Style Guide Appendix of DataComputing eBook. Specifically, all code chunks demonstrate proficiency with (1) meaningful object names (2) proper use of white space especially with respect to infix operators, chain operators, commas, brackets/parens, etc (3) use of `<-` assignment operator throughout (4) use of meaningful comments.

(C) *Narrative quality:* The narrative text (1) clearly states one research question that motivates the overall analysis, (2) explains reasoning for each significant step in the analysis and its relationship to the research question, (3) explains significant findings and conclusions as they relate to the research question, and (4) is completely free of errors in spelling and grammar

(D) *Overall Quality:* (1) Submitted project shows significant effort to produce a high-quality and thoughtful analysis that showcases STAT 184 skills. (2) The project must be self-contained, such that the analysis can be entirely rerun without errors. (3) Analysis is coherent, well-organized, and free of extraneous content such as data dumps, unrelated graphs, and other content that is not overtly connected to the research question.

(E) *EXTRA CREDIT* (1) Project is submitted as a self-contained GitHub Repo (2) project submission is a functioning github.io webpage generated for the project Repo. Note: a link to the GitHub Repo itself will be awarded partial credit, but does not itself qualify as a "webpage" of the analysis.
